Institut für Informatik, Neuroinformatics Group
Reinforcement Learning with Recurrent Neural Networks
Author
Abstract
Controlling a high-dimensional dynamical system with continuous state and action spaces in a partially unknown environment, such as a gas turbine, is a challenging problem. So far, hard-coded rules based on expert knowledge and experience are often used. Machine learning techniques, which comprise the field of reinforcement learning, are generally applied only to sub-problems. One reason for this is that most standard reinforcement learning approaches still fail to produce satisfactory results in such complex environments. Besides, they are rarely data-efficient, which is crucial for most real-world applications, where the available amount of data is limited. In this thesis, recurrent neural reinforcement learning approaches for identifying and controlling dynamical systems in discrete time are presented. They form a novel connection between recurrent neural networks (RNN) and reinforcement learning (RL) techniques. Thereby, instead of focusing on algorithms, neural network architectures are put in the foreground. RNN are used because they allow for the identification of dynamical systems in the form of high-dimensional, non-linear state-space models. They have also proven to be very data-efficient. In addition, a proof is given of their capability to universally approximate open dynamical systems. Moreover, it is pointed out that, in contrast to an often-cited statement, they are well able to capture long-term dependencies. As a first step towards reinforcement learning, it is shown that RNN can map and reconstruct (partially observable) Markov decision processes well. In doing so, the resulting inner state of the network can be used as a basis for standard RL algorithms. This so-called hybrid RNN approach is rather simple but has shown good results in a number of applications. The further developed recurrent control neural network combines system identification and determination of an optimal policy in one network. It not only learns from data but also integrates prior knowledge into the modelling in the form of architectural concepts. Furthermore, in contrast to most RL methods, it determines the optimal policy directly, without making use of a value function. This also distinguishes the approach from other work on reinforcement learning with recurrent networks. The methods are tested on several standard benchmark problems. In addition, they are applied to different kinds of gas turbine simulations of industrial scale.
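The abstract describes RNN as discrete-time, non-linear state-space models whose inner state aggregates the observation history and can then feed a standard RL algorithm (the hybrid RNN approach). The sketch below illustrates that idea only; the class name, dimensions, and tanh transition are illustrative assumptions, not the thesis's actual architecture.

```python
import numpy as np

class StateSpaceRNN:
    """Minimal discrete-time RNN state-space model (a sketch):
        s_{t+1} = tanh(A s_t + B u_t)   # inner state transition
        y_t     = C s_{t+1}             # predicted observation
    The inner state s_t summarizes the history of inputs/observations,
    so it can serve as the state representation handed to a standard
    RL algorithm in the hybrid approach described in the abstract.
    """
    def __init__(self, state_dim, action_dim, obs_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights; in practice these would be fitted to data
        # (system identification), not left at their initial values.
        self.A = rng.normal(scale=0.5 / np.sqrt(state_dim),
                            size=(state_dim, state_dim))
        self.B = rng.normal(scale=0.5, size=(state_dim, action_dim))
        self.C = rng.normal(scale=0.5, size=(obs_dim, state_dim))
        self.s = np.zeros(state_dim)

    def step(self, u):
        """Advance the inner state with input u, return the predicted observation."""
        self.s = np.tanh(self.A @ self.s + self.B @ u)
        return self.C @ self.s

# Drive the model with a scalar input sequence.
model = StateSpaceRNN(state_dim=4, action_dim=1, obs_dim=1)
for t in range(10):
    y = model.step(np.array([np.sin(0.3 * t)]))
print(model.s)  # bounded inner state, usable as input to an RL policy
```

Because the tanh keeps the inner state bounded, the state vector is a well-behaved feature input for a downstream value-function or policy learner.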
Similar resources
The “echo state” approach to analysing and training recurrent neural networks – with an Erratum note
The report introduces a constructive learning algorithm for recurrent neural networks, which modifies only the weights to the output units in order to achieve the learning task. Key words: recurrent neural networks, supervised learning.
Reinforcement Learning in Neural Networks: A Survey
In recent years, research on reinforcement learning (RL) has focused on bridging the gap between adaptive optimal control and bio-inspired learning techniques. Neural network reinforcement learning (NNRL) is among the most popular algorithms in the RL framework. The use of neural networks enables RL to search for optimal policies more efficiently in several real-life applicat...
Applying Policy Iteration for Training Recurrent Neural Networks
Recurrent neural networks are often used for learning time-series data. Based on a few assumptions, we model this learning task as the minimization of a nonlinear least-squares cost function. The special structure of the cost function allows us to build a connection to reinforcement learning. We exploit this connection and derive a convergent, policy-iteration-based algorithm. Furthermore,...
Stable reinforcement learning with recurrent neural networks
In this paper, we present a technique for ensuring the stability of a large class of adaptively controlled systems. We combine IQC models of both the controlled system and the controller with a method of filtering control parameter updates to ensure stable behavior of the controlled system under adaptation of the controller. We present a specific application to a system that uses recurrent neur...
Publication date: 2008